DOC: update the DataFrame.count docstring #20221

Merged
merged 5 commits into from
Mar 12, 2018

@joders joders commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `NaN` and
`None`)

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If equal 0 or 'index' counts are generated for each column.
    If equal 1 or 'columns' counts are generated for each row.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular level, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If level is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
>>> df=pd.DataFrame({ "Person":["John","Myla",None],
...                   "Age":[24.,np.nan,21.],
...                   "Single":[False,True,True]     })
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2    None  21.0    True
>>> df.count()
Person    2
Age       2
Single    3
dtype: int64
>>> df.count(axis=1)
0    3
1    2
2    2
dtype: int64

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.


Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `NaN` and
`None`)

Could you also add NaT here (that's our missing value for datetime data)
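
A minimal sketch of the point above, assuming a reasonably recent pandas install; the column names are made up for illustration. It suggests that `count` skips `NaT` in a datetime column the same way it skips `NaN` and `None`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one missing value of each kind, one per column.
df = pd.DataFrame({
    "when": [pd.Timestamp("2018-03-10"), pd.NaT, pd.Timestamp("2018-03-12")],
    "value": [1.0, np.nan, 3.0],
    "label": ["a", "b", None],
})

# count() reports the number of non-NA cells per column.
counts = df.count()
print(counts)
```

Each column holds three cells with exactly one missing entry, so each count should come out as 2.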

level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a DataFrame
If equal 0 or 'index' counts are generated for each column.

I think you can remove "equal" from this line and the next.


Examples
--------
>>> df=pd.DataFrame({ "Person":["John","Myla",None],

PEP8 on this example: spaces around `=`, no space after `{`, a space after `:`, and a space after `,`.


Could you also add an example with level=? I think you could

  1. Make the dataframe 2 items longer and repeat John and Myla.
  2. Update the df output and df.count examples
  3. show df.set_index(['Person', 'Single']).count(level='Person')
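
A rough sketch of step 3, with one caveat: later pandas releases deprecated the `level` argument of `DataFrame.count` (1.3) and then removed it (2.0). The version below therefore swaps in `groupby(level=...)`, which should give the same per-person counts on both old and new versions. It is an equivalent for illustration, not the call quoted in the review.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Person": ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})

# Move two columns into a MultiIndex, leaving "Age" as the only data column.
indexed = df.set_index(["Person", "Single"])

# Group on one level of the MultiIndex and count non-NA cells per group.
# Rows whose "Person" key is missing are dropped by groupby's default settings.
per_person = indexed.groupby(level="Person").count()
print(per_person)
```

John has two non-missing ages and Myla has one, which matches the output the docstring example reports for `count(level="Person")`.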

Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA

refer to isna instead
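
For context, a small sketch assuming a pandas version that has `isna` (0.21 or later): `isnull` is an alias of `isna`, so both return the same mask, and `count` should agree with summing the complementary `notna` mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Person": ["John", "Myla", None],
                   "Age": [24., np.nan, 21.]})

# isna and isnull are two names for the same check.
same_mask = df.isna().equals(df.isnull())

# Counting non-NA cells is the same as summing the "not NA" mask per column.
counts_match = bool((df.count() == df.notna().sum()).all())

print(same_mask, counts_match)
```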

2 None 21.0 True
3 John 33.0 True
4 Myla 26.0 False
>>> df.count()

blank line between cases

@TomAugspurger TomAugspurger left a comment

Can you paste the output of the doc validation script again?


Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `None`,
`NaN` and `NaT`)

End with a .

If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
If the axis is a `MultiIndex` (hierarchical), count along a
particular level, collapsing into a `DataFrame`.

backticks around the `level` parameter.

joders commented Mar 11, 2018

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `None`,
`NaN` and `NaT`).

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If 0 or 'index' counts are generated for each column.
    If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular `level`, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If `level` is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isna: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2    None  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    4
Age       4
Single    5
dtype: int64

Counts for each **row**:

>>> df.count(axis='columns')
0    3
1    2
2    2
3    3
4    3
dtype: int64

Counts for one level of a `MultiIndex`:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Myla      1

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)

axis. Works with non-floating point data as well (detects NaN and None)
Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
Contributor

One last change, maybe remove the first sentence since this can return a DataFrame with level.

I think just use the extended summary to say what counts as non-null data.

The values None, NaN, NaT, and optionally np.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

Contributor Author

Do you mean the first sentence in the extended summary, i.e. :
"Return Series with number of non-NA observations over requested axis."

If I understand you right I would change the entire summary (i.e. short and extended summary) to look like the following:

        Count non-NA cells for each column or row.

        The values None, NaN, NaT, and optionally np.inf (depending on
        pandas.options.mode.use_inf_as_na) are considered NA.

Contributor

Yeah. np.inf to `numpy.inf` and single backticks around pandas.options.mode.use_inf_as_na.
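
A hedged illustration of the summary being discussed. It deliberately avoids setting `pandas.options.mode.use_inf_as_na`, since that option is deprecated in newer pandas releases; instead it shows the default behaviour (infinities are counted) and an explicit `replace` as one assumed way to exclude them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.inf, np.nan, -np.inf]})

# By default only NaN is treated as missing here; both infinities are counted.
default_count = df.count()["x"]

# Turning infinities into NaN first makes count() skip them as well.
finite_only_count = df.replace([np.inf, -np.inf], np.nan).count()["x"]

print(default_count, finite_only_count)
```

With the default settings the first count is 3 and the second is 1.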


>>> df = pd.DataFrame({"Person":
... ["John", "Myla", None, "John", "Myla"],
... "Age": [24., np.nan, 21., 33, 26],
Contributor

PEP8: indent one more space. Same with the line below.

@joders joders Mar 11, 2018

For me, flake8 complains if I change that. On my system flake8 doesn't check the examples, so I copied the example into the code:

        df = pd.DataFrame({"Person":
                           ["John", "Myla", None, "John", "Myla"],
                           "Age": [24., np.nan, 21., 33, 26],
                           "Single": [False, True, True, True, False]})
        df

If I have it like this, flake8 only complains about `pd` not being defined:
pandas/core/frame.py:5672:14: F821 undefined name 'pd'

Contributor

Sorry I misread.
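
The manual copy-and-lint step described in this thread could be approximated with the standard library. The sketch below is only an assumption about one way to do it, not something the PR or the pandas tooling uses: `extract_example_source` is a hypothetical helper that pulls the `>>>` code out of a docstring with `doctest.DocTestParser`, so the result can be written to a file and passed to flake8.

```python
import doctest


def extract_example_source(docstring: str) -> str:
    """Return the code of all ``>>>`` examples in ``docstring`` as plain Python."""
    parser = doctest.DocTestParser()
    # Each Example keeps its code in .source (prompts stripped, newline-terminated)
    # and its expected output in .want, which is ignored here.
    examples = parser.get_examples(docstring)
    return "".join(example.source for example in examples)


sample = '''
Examples
--------
>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df.count()
Person    4
Age       4
Single    5
dtype: int64
'''

source = extract_example_source(sample)
print(source)
```

The extracted text keeps the continuation-line indentation relative to the first line, so a linter sees the same alignment as in the docstring. As in the thread, flake8 would still report `pd` and `np` as undefined unless imports are prepended.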

joders commented Mar 12, 2018

################################################################################
###################### Docstring (pandas.DataFrame.count) ######################
################################################################################

Count non-NA cells for each column or row.

The values `None`, `NaN`, `NaT`, and optionally `numpy.inf` (depending
on `pandas.options.mode.use_inf_as_na`) are considered NA.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    If 0 or 'index' counts are generated for each column.
    If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
    If the axis is a `MultiIndex` (hierarchical), count along a
    particular `level`, collapsing into a `DataFrame`.
    A `str` specifies the level name.
numeric_only : boolean, default False
    Include only `float`, `int` or `boolean` data.

Returns
-------
Series or DataFrame
    For each column/row the number of non-NA/null entries.
    If `level` is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
    elements)
DataFrame.isna: boolean same-sized DataFrame showing places of NA
    elements

Examples
--------
Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2    None  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    4
Age       4
Single    5
dtype: int64

Counts for each **row**:

>>> df.count(axis='columns')
0    3
1    2
2    2
3    3
4    3
dtype: int64

Counts for one level of a `MultiIndex`:

>>> df.set_index(["Person", "Single"]).count(level="Person")
        Age
Person
John      2
Myla      1

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.count" correct. :)

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 12, 2018
@TomAugspurger TomAugspurger merged commit 0596cb1 into pandas-dev:master Mar 12, 2018
@TomAugspurger

Thanks @joders!

joders commented Mar 13, 2018

thanks for providing pandas
